Learning Translations for Tagged Words: Extending the Translation Lexicon of an ITG for Low Resource Languages
Authors
Abstract
We tackle the challenge of learning part-of-speech classified translations as part of an inversion transduction grammar, by learning translations for English words with known part-of-speech tags, both from existing translation lexica and from parallel corpora. When translating from a low-resource language into English, we can expect to have rich resources for English, such as treebanks, and small amounts of bilingual resources, such as translation lexica and parallel corpora. We solve the problem of integrating these heterogeneous resources into a single model using stochastic Inversion Transduction Grammars, which we augment with wildcards to handle unknown translations.
Similar resources
Enrichment of a Bilingual Lexicon by Analogical Learning
Unknown words are a well-known hindrance to natural language applications. In particular, they drastically impact machine translation quality. An easy way out that commercial translation systems usually offer their users is the possibility to add unknown words and their translations into a dedicated lexicon. Recently, Stroppa and Yvon (2005) showed how analogical learning alone deals nicely with morp...
Supervised Bilingual Lexicon Induction with Multiple Monolingual Signals
Prior research into learning translations from source and target language monolingual texts has treated the task as an unsupervised learning problem. Although many techniques take advantage of a seed bilingual lexicon, this work is the first to use that data for supervised learning to combine a diverse set of signals derived from a pair of monolingual corpora into a single discriminative model....
Translating Unknown Words by Analogical Learning
Unknown words are a well-known hindrance to natural language applications. In particular, they drastically impact machine translation quality. An easy way out that commercial translation systems usually offer their users is the possibility to add unknown words and their translations into a dedicated lexicon. Recently, Stroppa and Yvon (2005) have shown how analogical learning alone deals nicely with...
A Comprehensive Analysis of Bilingual Lexicon Induction
Bilingual lexicon induction is the task of inducing word translations from monolingual corpora in two languages. In this paper we present the most comprehensive analysis of bilingual lexicon induction to date. We present experiments on a wide range of languages and data sizes. We examine translation into English from 25 foreign languages: Albanian, Azeri, Bengali, Bosnian, Bulgarian, Cebuano, G...
Improving word alignment for low resource languages using English monolingual SRL
We introduce a new statistical machine translation approach specifically geared to learning translation from low resource languages that exploits monolingual English semantic parsing to bias inversion transduction grammar (ITG) induction. We show that in contrast to conventional statistical machine translation (SMT) training methods, which rely heavily on phrase memorization, our approach focu...
Publication date: 2016